{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "# Build your first SynapseML models\n",
    "This tutorial gives a brief introduction to SynapseML by building two sentiment analysis pipelines. The first pipeline combines a text featurization stage with LightGBM regression to predict ratings from the review text in a dataset of book reviews from Amazon. The second pipeline uses prebuilt models from Azure AI Services to solve the same problem without any training data."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "source": [
    "## Load a dataset\n",
    "Load the dataset, limit it to 1,000 rows so the example runs quickly, and split it into training and test sets in a roughly 80/20 ratio."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "396e4834-0140-418b-8867-4a0e20c547d6",
     "showTitle": false,
     "title": ""
    }
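   },
   "outputs": [],
   "source": [
    "# Read the Amazon book reviews dataset and cap it at 1,000 rows.\n",
    "# Caching avoids re-reading the file each time the splits are used.\n",
    "train, test = (\n",
    "    spark.read.parquet(\n",
    "        \"wasbs://publicwasb@mmlspark.blob.core.windows.net/BookReviewsFromAmazon10K.parquet\"\n",
    "    )\n",
    "    .limit(1000)\n",
    "    .cache()\n",
    "    .randomSplit([0.8, 0.2], seed=42)  # fixed seed for a repeatable split\n",
    ")\n",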
    "\n",
    "display(train)"
   ]
  },
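  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check, a sketch like the following prints the schema and the size of each split. It assumes the `train` and `test` DataFrames from the previous cell. Because `randomSplit` assigns rows at random, the counts are only approximately 80/20."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the column names and types that the later stages rely on.\n",
    "train.printSchema()\n",
    "\n",
    "# randomSplit is probabilistic, so expect roughly (not exactly) an 80/20 split.\n",
    "print(f\"train rows: {train.count()}, test rows: {test.count()}\")"
   ]
  },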
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "51b3c66a-3582-429a-a969-c2fb66e77c49",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Create the training pipeline\n",
    "Create a pipeline that featurizes the review text with `TextFeaturizer` from the `synapse.ml.featurize.text` module and predicts a rating with the `LightGBMRegressor` estimator. Calling `fit` on the training data returns the trained pipeline model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "df76cfec-a945-469c-a9df-47e9144e37eb",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "from pyspark.ml import Pipeline\n",
    "from synapse.ml.featurize.text import TextFeaturizer\n",
    "from synapse.ml.lightgbm import LightGBMRegressor\n",
    "\n",
    "model = Pipeline(\n",
    "    stages=[\n",
    "        TextFeaturizer(inputCol=\"text\", outputCol=\"features\"),\n",
    "        LightGBMRegressor(featuresCol=\"features\", labelCol=\"rating\"),\n",
    "    ]\n",
    ").fit(train)"
   ]
  },
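  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `fit` produced, a minimal sketch such as the following lists the fitted stages. It assumes `model` is the `PipelineModel` returned by the previous cell. The class names it prints may differ between SynapseML versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A fitted Pipeline is a PipelineModel whose `stages` list holds the fitted stages in order.\n",
    "for index, stage in enumerate(model.stages):\n",
    "    print(index, type(stage).__name__)"
   ]
  },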
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "e0d23e7d-14c7-4a38-9983-fac290e97def",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Predict the output of the test data\n",
    "Call the `transform` method on the fitted model to score the test data and display the predictions as a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "70845164-e6df-4948-aa68-d8f4c5537eaa",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "display(model.transform(test))"
   ]
  },
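  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Displaying predictions shows individual rows but not overall quality. One way to summarize it is to compute the root mean squared error with Spark's `RegressionEvaluator`. This sketch assumes that `rating` is a numeric column and that `LightGBMRegressor` writes to its default `prediction` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark.ml.evaluation import RegressionEvaluator\n",
    "\n",
    "# Assumed column names: \"rating\" (label) and \"prediction\" (default output column).\n",
    "evaluator = RegressionEvaluator(\n",
    "    labelCol=\"rating\", predictionCol=\"prediction\", metricName=\"rmse\"\n",
    ")\n",
    "\n",
    "# Score the held-out rows and report a single error figure.\n",
    "rmse = evaluator.evaluate(model.transform(test))\n",
    "print(f\"RMSE on the test set: {rmse:.3f}\")"
   ]
  },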
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "cbc5ddec-2984-4fea-b164-d1127f46f919",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Use Azure AI Services to transform data in one step\n",
    "Alternatively, for tasks like this one that have a prebuilt solution, you can use SynapseML's integration with Azure AI Services to transform your data in one step, with no model training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from synapse.ml.services.language import AnalyzeText\n",
    "from synapse.ml.core.platform import find_secret\n",
    "\n",
    "model = AnalyzeText(\n",
    "    textCol=\"text\",\n",
    "    outputCol=\"sentiment\",\n",
    "    kind=\"SentimentAnalysis\",\n",
    "    subscriptionKey=find_secret(\n",
    "        secret_name=\"ai-services-api-key\", keyvault=\"mmlspark-build-keys\"\n",
    "    ),  # Replace the call to find_secret with your key as a Python string.\n",
    ").setLocation(\"eastus\")\n",
    "\n",
    "display(model.transform(test))"
   ]
  }
 ],
 "metadata": {
  "description": null,
  "kernelspec": {
   "display_name": "Synapse PySpark",
   "name": "synapse_pyspark"
  },
  "language_info": {
   "name": "python"
  },
  "save_output": true,
  "synapse_widget": {
   "state": {},
   "version": "0.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
